BerkeleyDB to Redis migration #145

Open
christophe-pietquin wants to merge 1 commit into yasserg:master from christophe-pietquin:redis

@christophe-pietquin

This pull request is a simple implementation of the frontier component using Redis instead of BerkeleyDB, in case there is interest in a Redis-backed crawler4j.

@Chaiavi
Contributor

Chaiavi commented Jul 15, 2016

Let me make sure I understand, as I don't know Redis.

For this to work, do I need to install a Redis server somewhere?
Why did you change to Redis in the first place? Did you encounter any problem with the current internal DB? Did Redis perform better in terms of memory or something?

@christophe-pietquin
Author

Right, you need to start a Redis server.
I have not run into any issues with the internal DB. The idea is not really to move crawler4j to Redis but to start supporting multiple datastores. (I have a client whose existing infrastructure requires using internal tools like Redis.)

@Chaiavi
Contributor

Chaiavi commented Jul 15, 2016

In this form it can't be integrated into the core of crawler4j, as we aim
crawler4j at simple users who don't want to fuss with it and just want
an instant tool for crawling.

For more advanced users with many demands, there are several other
crawlers out there, like Solr-Lucene, Elasticsearch, StormCrawler and lots
of others.

But this is Yasser's call. He might find it useful to integrate a Redis
DB if it were optional and the default remained as it is, but again, this
is Yasser's call.

Anyway, thank you for your contribution. I am sure that, even as it is, many
will find it useful.

@manojchandar

@Chaiavi I am a little curious: BerkeleyDB falls under the GPL license, whereas crawler4j is Apache 2.0. Please help me understand how it can be used here.

@Chaiavi
Contributor

Chaiavi commented Nov 30, 2016 via email

s17t mentioned this pull request Nov 30, 2016
@rzo1
Contributor

rzo1 commented Dec 15, 2016

Maybe it would be beneficial to provide interfaces for the DB-specific classes and then provide the implementations via separate Maven modules, so that the implementation in use can be swapped quickly.

Any ideas @s17t @yasserg @Chaiavi ?
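
To make the suggestion above concrete, here is a minimal sketch of what a storage-agnostic frontier abstraction could look like. All names in it (`FrontierStore`, `InMemoryFrontierStore`, `FrontierClient`) are hypothetical and are not part of crawler4j's actual API. The in-memory class only stands in for a real backend; a BerkeleyDB or Redis implementation would live in its own Maven module and implement the same interface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical interface, NOT crawler4j's real API: the core module would
// depend only on this contract, never on a specific datastore.
interface FrontierStore {
    // Queue a URL to be crawled later.
    void schedule(String url);

    // Remove and return up to `max` URLs in FIFO order.
    List<String> next(int max);

    // Number of URLs still waiting to be crawled.
    long pendingCount();
}

// Hypothetical stand-in backend that keeps everything in memory.
// A BerkeleyDB-backed or Redis-backed class would implement the same
// interface from a separate module.
final class InMemoryFrontierStore implements FrontierStore {
    private final Deque<String> queue = new ArrayDeque<>();

    @Override
    public synchronized void schedule(String url) {
        queue.addLast(url);
    }

    @Override
    public synchronized List<String> next(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !queue.isEmpty()) {
            batch.add(queue.pollFirst());
        }
        return batch;
    }

    @Override
    public synchronized long pendingCount() {
        return queue.size();
    }
}

// Hypothetical consumer: the store is injected through the constructor,
// so swapping the backend requires no change to this class.
final class FrontierClient {
    private final FrontierStore store;

    FrontierClient(FrontierStore store) {
        this.store = store;
    }

    void seed(List<String> urls) {
        for (String url : urls) {
            store.schedule(url);
        }
    }

    List<String> nextBatch(int max) {
        return store.next(max);
    }
}
```

With this shape, choosing a backend comes down to which implementation is passed to the constructor, for example `new FrontierClient(new InMemoryFrontierStore())`, and the default could stay embedded so that simple users never need to run an external server.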

@s17t
Contributor

s17t commented Dec 15, 2016

Maybe it would be beneficial to provide interfaces for the db-specific

@rzo1 this is the way to go. I would like to drain the pull request queue a little more and write some tests around the current Frontier part before that. I've added a few things that should make it easier to write tests (see WireMock, Spock and the Groovy compiler for the src/test/groovy folder).

@rzo1
Contributor

rzo1 commented Dec 16, 2016

@s17t Yes. We should indeed aim for better test coverage before working on this core part.

@Chaiavi
Contributor

Chaiavi commented Dec 16, 2016 via email

s17t added this to the 4.4.0 milestone Mar 30, 2017
s17t mentioned this pull request May 22, 2017
s17t added a commit to s17t/crawler4j that referenced this pull request Mar 1, 2018
- no need for core classes to extend Configurable, good old IoC is better
- deprecated Configurable
s17t added a commit that referenced this pull request Mar 1, 2018
s17t modified the milestones: 4.4.0, 4.5.0 Mar 19, 2018